A growing body of literature suggests that a wide array of lifestyle activities are associated with better cognitive function and reduced risk for age-related neurodegenerative disorders.
However, many studies of lifestyle and cognition involve primarily elderly participants in a controlled laboratory setting.
Use of Internet- and mobile application-based technology for data collection (Killingsworth & Gilbert, 2010; Lee et al., 2012; Nosek et al., 2009) has allowed researchers to collect a large amount of data with cultural and regional diversity.
Using data collected from an iPad application (BrainBaseline), we examined the association between various lifestyle activities and cognitive function across the adult lifespan.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import pylab as pl
from collections import Counter
from scipy import stats
from IPython.display import Image
sns.set_style("whitegrid")
sns.set_context("talk")
%matplotlib inline
Image('Brainbaseline/APS_2014_Lee.png')
Image('Brainbaseline/bb.png')
We performed a Principal Components Analysis (PCA) on survey information and categorized lifestyle factors into three categories: physical activity, leisure activity, and socioeconomic status.
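The composite scores are loaded precomputed below, and the survey items behind them are not shown in this notebook. As a rough illustration of how a PCA can turn correlated survey items into composite scores, the following sketch runs a PCA via singular value decomposition on synthetic data. The item names and the data are hypothetical and are not the BrainBaseline survey.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500

# Hypothetical survey items (not the real BrainBaseline questions):
# two latent traits each drive a pair of correlated items.
physical = rng.normal(size=n)
leisure = rng.normal(size=n)
survey = pd.DataFrame({
    "walk_hours": physical + 0.3 * rng.normal(size=n),
    "gym_visits": physical + 0.3 * rng.normal(size=n),
    "reading_hours": leisure + 0.3 * rng.normal(size=n),
    "museum_visits": leisure + 0.3 * rng.normal(size=n),
})

# Standardize each item, then take the SVD of the standardized matrix.
z = (survey - survey.mean()) / survey.std(ddof=0)
u, s, vt = np.linalg.svd(z.to_numpy(), full_matrices=False)

explained = s**2 / np.sum(s**2)                      # variance share per component
loadings = pd.DataFrame(vt.T, index=survey.columns)  # item weights per component
scores = z.to_numpy() @ vt.T                         # component scores per participant

print(explained.round(2))
print(loadings.iloc[:, :2].round(2))
```

In this toy setup the first two components should carry most of the variance, because the four items were generated from two underlying traits.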
Let's load data with lifestyle composite scores and bin it by age range.
bb = pd.read_csv("Brainbaseline/bb_all.csv", sep=",", skipinitialspace=True)
bb = bb.dropna(how='all')
bb = bb[['age', 'ageBin', 'exerciseScore', 'leisureScore', 'socioEconomicScore', 'memoryComposite', 'processingSpeedComposite_r']]
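# Keep adults aged 20-79; copy so that adding a column does not touch a view of the original frame.
bb = bb[(bb["age"] >= 20) & (bb["age"] < 80)].copy()
# Decade bins: 2 = 20s, ..., 6 = 60s; participants in their 70s are folded into bin 6.
bb['ageBin'] = (bb['age'] // 10).clip(upper=6)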
We will also split the leisure, exercise, and socioeconomic scores into high and low groups at the median within each age bin for later analysis.
whichScore = ['leisureScore', 'exerciseScore', 'socioEconomicScore']
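# One dataframe per age bin; copy so that new columns can be added safely.
age_dict = {i: bb[bb['ageBin'] == i].copy() for i in range(2, 7)}

def binning(score, field):
    """Add a '<field>_Bin' column: 'high' above the median of `field`, otherwise 'low'."""
    median = score[field].dropna().quantile(.50)
    #cats = pd.qcut(data, 4)
    # Rows with a missing score stay missing in the new column.
    score[field + '_Bin'] = score[field].dropna().map(lambda x: 'high' if x > median else 'low')

for s in whichScore:
    for i in range(2, 7):
        binning(age_dict[i], s)

# DataFrame.append was removed in pandas 2.0; pd.concat stacks the per-bin frames instead.
bb_new = pd.concat([age_dict[i] for i in range(2, 7)])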
print(Counter(bb_new['leisureScore_Bin']))
print(Counter(bb_new['exerciseScore_Bin']))
print(Counter(bb_new['socioEconomicScore_Bin']))
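The per-age-bin median split can also be written without explicit loops. The sketch below applies the same rule (above the bin median is 'high', otherwise 'low', and a missing score stays missing) to a small made-up frame using `groupby(...).transform("median")`. The toy values and the helper name `median_split` are illustrative only.

```python
import numpy as np
import pandas as pd

# Toy data standing in for bb; the values are made up for illustration.
toy = pd.DataFrame({
    "ageBin": [2, 2, 2, 2, 3, 3, 3, 3],
    "leisureScore": [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, np.nan, 40.0],
})

def median_split(df, field, by="ageBin"):
    """Label each score 'high' or 'low' relative to its own group's median."""
    group_median = df.groupby(by)[field].transform("median")
    labels = pd.Series(np.where(df[field] > group_median, "high", "low"),
                       index=df.index, dtype=object)
    # Keep missing scores missing instead of labelling them 'low'.
    return labels.where(df[field].notna())

toy["leisureScore_Bin"] = median_split(toy, "leisureScore")
print(toy)
```

A score exactly equal to its group median is labelled 'low', matching the strict `>` comparison used in `binning`.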
Rename columns with more intuitive names.
bb_new.rename(columns={'leisureScore_Bin': 'leisureBin',
                       'exerciseScore_Bin': 'exerciseBin',
                       'socioEconomicScore_Bin': 'socioEconomicBin',
                       'memoryComposite': 'memory',
                       'processingSpeedComposite_r': 'processingSpeed'},
              inplace=True)
Let's view the first five participants to check what we have in our dataframe.
bb_new.head()
We have data from about 24,000 users. Let's check the age distribution.
plt.hist(bb_new['age']);
plt.xlabel('Age')
plt.ylabel('Frequency')
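Next, we check that the data replicate the age-related decline in each cognitive function.
# factorplot was renamed catplot in later seaborn releases; kind="point" keeps the original point plot.
sns.catplot(data=bb_new, x="ageBin", y="memory", kind="point")
sns.catplot(data=bb_new, x="ageBin", y="processingSpeed", kind="point");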
Next, we regress cognitive function on each lifestyle score after controlling for the effect of age. The relationships are positive and linear, except for exercise score on memory function.
f1, (ax1, ax2, ax3) = plt.subplots(1, 3, sharey=True)
sns.regplot(data=bb_new, x="leisureScore", y="memory", x_partial="age", order=1, ax=ax1)
sns.regplot(data=bb_new, x="exerciseScore", y="memory", x_partial="age", order=1, ax=ax2).set_ylabel('')
sns.regplot(data=bb_new, x="socioEconomicScore", y="memory", x_partial="age", order=1, ax=ax3).set_ylabel('')
ax3.set(xlim=(0, 10), ylim=(-10, 5));
f1.tight_layout()
f2, (ax1, ax2, ax3) = plt.subplots(1, 3, sharey=True)
sns.regplot(data=bb_new, x="leisureScore", y="processingSpeed", x_partial="age", order=1, ax=ax1)
sns.regplot(data=bb_new, x="exerciseScore", y="processingSpeed", x_partial="age", order=1, ax=ax2).set_ylabel('')
sns.regplot(data=bb_new, x="socioEconomicScore", y="processingSpeed", x_partial="age", order=1, ax=ax3).set_ylabel('')
ax3.set(xlim=(0, 10), ylim=(-10, 5));
f2.tight_layout()
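To make "controlling for age" concrete: one common way to remove a confound is to regress it out and work with the residuals. The sketch below does this on synthetic data with NumPy and SciPy, residualizing both variables on age (a partial regression). The variable names and effect sizes are invented, and this is meant as an illustration of the idea rather than a description of what seaborn's `x_partial` does internally.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 2000

# Synthetic data: age lowers both the lifestyle score and memory,
# and the score has its own positive effect (0.5) on memory.
age = rng.uniform(20, 80, size=n)
score = 10 - 0.05 * age + rng.normal(size=n)
memory = 0.5 * score - 0.04 * age + rng.normal(size=n)

def residualize(values, confound):
    """Return what is left of `values` after a linear fit on `confound`."""
    fit = stats.linregress(confound, values)
    return values - (fit.intercept + fit.slope * confound)

# Slope without any adjustment, and slope after removing age from both variables.
raw = stats.linregress(score, memory)
partial = stats.linregress(residualize(score, age), residualize(memory, age))

print(f"raw slope:     {raw.slope:.2f}")
print(f"partial slope: {partial.slope:.2f}")  # should sit near the simulated 0.5
```

Here the unadjusted slope is inflated because age drives both variables in the same direction; the residualized slope recovers something close to the simulated effect.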
When we examine the effect of lifestyle activity on cognitive function within each age bin, we see broadly similar patterns regardless of age range.
# factorplot was renamed catplot in later seaborn releases; arguments are passed by keyword.
sns.catplot(data=bb_new, x="ageBin", y="memory", hue="leisureBin", kind="point", palette="Set1");
sns.catplot(data=bb_new, x="ageBin", y="memory", hue="exerciseBin", kind="point", palette="Set1");
sns.catplot(data=bb_new, x="ageBin", y="memory", hue="socioEconomicBin", kind="point", palette="Set1");
sns.catplot(data=bb_new, x="ageBin", y="processingSpeed", hue="leisureBin", kind="point", palette="Set1");
sns.catplot(data=bb_new, x="ageBin", y="processingSpeed", hue="exerciseBin", kind="point", palette="Set1");
sns.catplot(data=bb_new, x="ageBin", y="processingSpeed", hue="socioEconomicBin", kind="point", palette="Set1");
Diverse lifestyle activities show an additive benefit on cognitive function. Activity level is calculated by summing three lifestyle indicators, each scored 1 for high activity and 0 for low activity.
# Count how many of the three lifestyle bins are 'high' (a missing bin counts as 0).
lifestyleBins = ['exerciseBin', 'leisureBin', 'socioEconomicBin']
bb_new['activityLevel'] = (bb_new[lifestyleBins] == 'high').sum(axis=1)
sns.catplot(data=bb_new, x="activityLevel", y="memory", kind="point");
sns.catplot(data=bb_new, x="activityLevel", y="memory", hue="ageBin", kind="point");
sns.catplot(data=bb_new, x="activityLevel", y="processingSpeed", kind="point");
sns.catplot(data=bb_new, x="activityLevel", y="processingSpeed", hue="ageBin", kind="point");
There are also interactions between lifestyle activities.
sns.catplot(data=bb_new, x="socioEconomicBin", y="memory", hue="leisureBin", kind="bar", palette="Set1");
sns.catplot(data=bb_new, x="socioEconomicBin", y="processingSpeed", hue="exerciseBin", kind="bar", palette="Set1");
The benefit of leisure and exercise activities is more salient in the low socioeconomic status group.
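One simple way to put a number on an interaction like this is a difference of differences on the cell means: the high-minus-low leisure gap in the low socioeconomic group minus the same gap in the high socioeconomic group. The sketch below computes it on a made-up frame. The values are invented purely to show the calculation and carry no information about the BrainBaseline data.

```python
import pandas as pd

# Made-up example rows; the column names mirror the renamed dataframe above.
toy = pd.DataFrame({
    "socioEconomicBin": ["low"] * 4 + ["high"] * 4,
    "leisureBin": ["low", "low", "high", "high"] * 2,
    "memory": [-1.0, -0.8, 0.1, 0.3, 0.2, 0.4, 0.5, 0.7],
})

# Mean memory score in each socioeconomic x leisure cell.
cell_means = toy.pivot_table(index="socioEconomicBin",
                             columns="leisureBin",
                             values="memory",
                             aggfunc="mean")

# Leisure benefit (high minus low) within each socioeconomic group.
benefit = cell_means["high"] - cell_means["low"]

# A positive value means the benefit is larger in the low socioeconomic group.
interaction = benefit["low"] - benefit["high"]
print(cell_means)
print(f"difference of differences: {interaction:.2f}")
```

On real data this descriptive quantity would need a standard error or a model with an interaction term before drawing conclusions.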